
Conversation

@rainj-me (Collaborator) commented Aug 22, 2025

Motivation

#9490

  • Support CUDA 13.0 with custom FlashInfer and TRT-LLM kernels (Aug 25, 2025)
  • Support sm_110 and sm_121 on CUDA 13.0
  • Support --compress-mode=size on CUDA 13.0
  • Keep sm_101 support on CUDA 12.8/12.9

Test

  • Step 1, pull the NVIDIA PyTorch 25.08 image:
docker pull nvcr.io/nvidia/pytorch:25.08-py3
  • Step 2, run the container with an interactive bash shell
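For example (the exact flags are illustrative; adjust GPU access and mounts to your setup):
docker run --gpus all -it --rm --ipc=host nvcr.io/nvidia/pytorch:25.08-py3 bash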
  • Step 3, clone the branch with this change
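One way to do this, assuming the PR branch lives on the author's fork (the fork URL is a guess):
git clone -b dev/support_cuda130 https://github.com/rainj-me/sglang.git
cd sglang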
  • Step 4, comment out the torch dependency in sgl-kernel/pyproject.toml; the patch looks like:
diff --git a/sgl-kernel/pyproject.toml b/sgl-kernel/pyproject.toml
index 52ee620e4..177e49e57 100644
--- a/sgl-kernel/pyproject.toml
+++ b/sgl-kernel/pyproject.toml
@@ -1,7 +1,7 @@
 [build-system]
 requires = [
   "scikit-build-core>=0.10",
-  "torch>=2.8.0",
+  # "torch>=2.8.0",
   "wheel",
 ]
  • Step 5, patch python/pyproject.toml to match the torch version shipped in the container; in my container the patch looks like:
diff --git a/python/pyproject.toml b/python/pyproject.toml
index c23efbc2e..b29789d45 100644
--- a/python/pyproject.toml
+++ b/python/pyproject.toml
@@ -49,7 +49,7 @@ runtime_common = [
     "scipy",
     "timm==1.0.16",
     "tiktoken",
-    "torchao==0.9.0",
+    "torchao==0.12.0+git",
     "transformers==4.55.2",
     "uvicorn",
     "uvloop",
@@ -59,21 +59,19 @@ runtime_common = [
 srt = [
     "sglang[runtime_common]",
     "sgl-kernel==0.3.5",
-    "torch==2.8.0",
-    "torchaudio==2.8.0",
+    "torch==2.8.0a0+34c6371d24.nv25.8",
     "torchvision",
     "cuda-python",
-    "flashinfer_python==0.2.11.post3",
+    "flashinfer_python==0.2.14.post1",
 ]
 
 blackwell = [
     "sglang[runtime_common]",
     "sgl-kernel",
-    "torch==2.8.0",
-    "torchaudio==2.8.0",
+    "torch==2.8.0a0+34c6371d24.nv25.8",
     "torchvision",
     "cuda-python",
-    "flashinfer_python==0.2.11.post3",
+    "flashinfer_python==0.2.14.post1",
 ]
  • Step 6, install sgl-kernel:
CUDA_VERSION=13.0 CMAKE_BUILD_PARALLEL_LEVEL="$(nproc)" SKBUILD_BUILD_DIR=./build CMAKE_ARGS="-DCMAKE_POLICY_VERSION_MINIMUM=3.5"  pip install -v .
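
A quick sanity check after the install (minimal, not part of the PR):
python -c "import sgl_kernel, torch; print(torch.version.cuda)"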

Modifications

  • Use a custom FlashInfer build to support CUDA 13.0 and load the TRT-LLM kernels (Aug 21, 2025)
  • Fix the cub::Sum / cub::Max issue so the code builds on both CUDA 12.x and CUDA 13.0 (see the sketch after this list)
  • Use torch 2.8.x and CUDA 13.0
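
For context on the cub fix: CUDA 13.0 ships CCCL 3.x, which drops the cub::Sum / cub::Max function objects in favor of the cuda::std / cuda functional objects. A minimal sketch of a version guard, with hypothetical alias names (SumOp/MaxOp are illustrative, not the PR's actual identifiers):

// Compatibility aliases so reduction ops compile on both CUDA 12.x and 13.0.
#include <cub/cub.cuh>
#include <cuda_runtime.h>
#if CUDART_VERSION >= 13000
#include <cuda/functional>        // cuda::maximum
#include <cuda/std/functional>    // cuda::std::plus
using SumOp = cuda::std::plus<>;  // replaces cub::Sum
using MaxOp = cuda::maximum<>;    // replaces cub::Max
#else
using SumOp = cub::Sum;           // removed from cub in CUDA 13.0's CCCL
using MaxOp = cub::Max;
#endif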

Accuracy Tests

Benchmarking and Profiling

Checklist

@rainj-me force-pushed the dev/support_cuda130 branch from 0621958 to e85f927 on August 26, 2025 at 02:55
@rainj-me marked this pull request as ready for review on August 26, 2025 at 02:56
@rainj-me changed the title from "support cuda 13.0 and trtllm kernel by Aug 21 2025" to "support cuda 13.0 and trtllm kernel by Aug 25 2025" on Aug 26, 2025
@FlamingoPg self-assigned this on Aug 26, 2025
@rainj-me (Collaborator, Author) commented:

[Screenshot: built wheels, 2025-08-26]
  • sgl_kernel-0.3.6.post2-cp310-abi3-linux_x86_64_1.whl, built with the --compress-mode=size flag: wheel size 322 MB
  • sgl_kernel-0.3.6.post2-cp310-abi3-linux_x86_64.whl, built without the flag: wheel size 379 MB
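
For reference, --compress-mode=size is an nvcc option that trades decompression speed for smaller fatbins. One plausible way to pass it through this build, assuming the CMake setup honors CMAKE_CUDA_FLAGS (a sketch, not necessarily how the PR wires it):

CUDA_VERSION=13.0 CMAKE_ARGS="-DCMAKE_POLICY_VERSION_MINIMUM=3.5 -DCMAKE_CUDA_FLAGS=--compress-mode=size" pip install -v .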

@johnnynunez (Contributor) commented Aug 26, 2025

@zhyncs @rainj-me @faradawn LGTM

@FlamingoPg (Collaborator) left a comment:

Overall LGTM, wait for CI

@rainj-me merged commit 79e6a8a into sgl-project:main on Aug 27, 2025 (21 of 54 checks passed)
@rainj-me (Collaborator, Author) commented:

Follow-up: #9680

@rainj-me changed the title from "support cuda 13.0 and trtllm kernel by Aug 25 2025" to "support cuda 13.0 and trtllm kernel" on Aug 27, 2025
@voipmonitor (Contributor) commented:

@rainj-me how do you build the sgl-kernel, please? A simple "make build"? I'm getting this error:

/workspace/sglang/sgl-kernel/build/_deps/repo-mscclpp-src/include/mscclpp/atomic_device.hpp:10:10: fatal error: cuda/atomic: No such file or directory

(using the same procedure you described)

@zhyncs (Member) commented Aug 27, 2025

This breaks the latest build on B200 with cu128; I'll revert this first. @rainj-me

nvcc fatal   : Unsupported gpu architecture 'compute_103'

@johnnynunez (Contributor) commented Aug 27, 2025

> This breaks the latest build on B200 with cu128; I'll revert this first. @rainj-me
>
> nvcc fatal   : Unsupported gpu architecture 'compute_103'

This must be built with CUDA 13; compute_103 is GB300.
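
If you need to keep building on cu128 in the meantime, one untested workaround, assuming the build respects an explicit arch list via CMAKE_CUDA_ARCHITECTURES, is to exclude the CUDA-13-only architectures:

CUDA_VERSION=12.8 CMAKE_ARGS="-DCMAKE_POLICY_VERSION_MINIMUM=3.5 -DCMAKE_CUDA_ARCHITECTURES=90;100;120" pip install -v .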

@rainj-me (Collaborator, Author) commented:

> @rainj-me how do you build the sgl-kernel, please? A simple "make build"? I'm getting this error:
>
> /workspace/sglang/sgl-kernel/build/_deps/repo-mscclpp-src/include/mscclpp/atomic_device.hpp:10:10: fatal error: cuda/atomic: No such file or directory
>
> (using the same procedure you described)

Try the following command to build:

CUDA_VERSION=13.0 CMAKE_BUILD_PARALLEL_LEVEL="$(nproc)" SKBUILD_BUILD_DIR=./build CMAKE_ARGS="-DCMAKE_POLICY_VERSION_MINIMUM=3.5"  pip install -v .

@johnnynunez (Contributor) commented Aug 27, 2025

> @rainj-me how do you build the sgl-kernel, please? A simple "make build"? I'm getting this error:
>
> /workspace/sglang/sgl-kernel/build/_deps/repo-mscclpp-src/include/mscclpp/atomic_device.hpp:10:10: fatal error: cuda/atomic: No such file or directory
>
> (using the same procedure you described)

This error occurs because the include path does not point at the new CCCL location. For example, for CUDA 13 on sbsa:

export CPLUS_INCLUDE_PATH=/usr/local/cuda-13.0/targets/sbsa-linux/include/cccl

That resolves the problem for me on GH200.
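
On x86_64 the analogous path would presumably be:

export CPLUS_INCLUDE_PATH=/usr/local/cuda-13.0/targets/x86_64-linux/include/cccl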


https://developer.nvidia.com/blog/whats-new-and-important-in-cuda-toolkit-13-0/

@voipmonitor (Contributor) commented:

> @rainj-me how do you build the sgl-kernel, please? A simple "make build"? I'm getting this error:
> /workspace/sglang/sgl-kernel/build/_deps/repo-mscclpp-src/include/mscclpp/atomic_device.hpp:10:10: fatal error: cuda/atomic: No such file or directory
> (using the same procedure you described)
>
> Try the following command to build:
>
> CUDA_VERSION=13.0 CMAKE_BUILD_PARALLEL_LEVEL="$(nproc)" SKBUILD_BUILD_DIR=./build CMAKE_ARGS="-DCMAKE_POLICY_VERSION_MINIMUM=3.5" pip install -v .

Thank you. Is it worth trying the TRT-LLM kernels on the sm120 architecture (RTX PRO 6000), mostly for FP8 blockwise or compressed FP8 scale?

MahmoudAshraf97 pushed a commit to MahmoudAshraf97/sglang that referenced this pull request Sep 8, 2025